What factors affect the loan amount?
plt.figure(figsize=(22,9))
loan_data['Occupation'].value_counts().sort_index().plot.bar()
plt.ylabel('Count')
plt.title('Distribution of Occupation')
plt.show()
This graph shows that the most common occupations are professionals, teachers, executives, and computer programmers.
# see the smallest counts of occupations
loan_data['Occupation'].value_counts().tail(10).plot.bar()
plt.ylabel('Count')
plt.show()
The fewest loans were taken by judges.
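The rarest occupations can also be read off numerically instead of from the bar chart. A minimal sketch, assuming a small made-up frame `toy` stands in for `loan_data` (the occupations and counts are illustrative, not taken from the dataset):

```python
import pandas as pd

# Hypothetical stand-in for loan_data; values are illustrative only.
toy = pd.DataFrame({"Occupation": ["Teacher"] * 4 + ["Executive"] * 3 + ["Judge"]})

counts = toy["Occupation"].value_counts()  # sorted from most to least frequent
least_common = counts.nsmallest(2)         # the two rarest occupations
print(least_common)
```

On the real data, the same two calls on `loan_data['Occupation']` would list the smallest groups together with their exact counts.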
#count plot for Distribution of Loan status
plt.figure(figsize=(13,9))
loan_data['LoanStatus'].value_counts().plot(kind='bar')
plt.ylabel('Count')
plt.title('Distribution of Loan status')
Most of the loans are current, meaning they have not been closed yet. The second-largest group is loans that have been completed.
plt.figure(figsize=(13,9))
ax = sns.countplot(x="IncomeRange", data=loan_data)
plt.title('Distribution of Income Range')
Most borrowers have an income between $25,000 and $49,000. Above that range, the greater the income, the fewer loans are taken. People who do not work are given almost no loans.
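This comparison is easier to read when the income ranges appear in a meaningful order. A minimal sketch, assuming hypothetical range labels (check `loan_data['IncomeRange'].unique()` for the real ones) and made-up rows:

```python
import pandas as pd

# Hypothetical labels and rows; the real labels may differ.
order = ["Not employed", "$1-24,999", "$25,000-49,999", "$50,000-74,999"]
toy = pd.DataFrame({"IncomeRange": ["$25,000-49,999", "$1-24,999", "$25,000-49,999",
                                    "$50,000-74,999", "Not employed", "$25,000-49,999"]})

# Count each range, then arrange the counts from lowest to highest income.
counts = toy["IncomeRange"].value_counts().reindex(order)
print(counts)
```

The same list can be passed to the plot as `sns.countplot(x="IncomeRange", data=loan_data, order=order)`, since `countplot` accepts an `order` argument.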
plt.figure(figsize=(22,10))
sns.distplot(loan_data["MonthlyLoanPayment"])
plt.title('Monthly Loan Payment')
plt.ylabel('Count')
plt.show()
The distribution is unimodal and right-skewed, with one evident peak around 130. The values range from 0 to about 1000.
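Right skew can be checked numerically as well as visually. A minimal sketch on made-up payment values (not taken from the dataset): a positive skewness and a mean above the median both point to a long right tail.

```python
import pandas as pd

# Hypothetical monthly payments; a few large values create a long right tail.
payments = pd.Series([100, 120, 130, 130, 140, 150, 170, 400, 900])

skewness = payments.skew()  # a value above 0 indicates right skew
mean, median = payments.mean(), payments.median()
print(skewness, mean, median)
```

On the real data the same calls would be `loan_data["MonthlyLoanPayment"].skew()`, `.mean()` and `.median()`.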
sns.distplot(loan_data['LoanOriginalAmount'], kde=False, rug=True)
plt.ylabel('Count')
plt.title('Distribution of LoanOriginalAmount')
plt.show()
sns.distplot(loan_data['LoanOriginalAmount'], rug=True)
plt.ylabel('Count')
plt.show()
The distribution of LoanOriginalAmount has several peaks.
sns.distplot(loan_data['start_year'], rug=True)
plt.ylabel("Count")
plt.show()
Most loans were given in 2013. More loans were given each year until 2014, when they have
sns.distplot(loan_data['start_month'], rug=True)
plt.show()
Most loans were given in January and October, and the fewest in April.
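The `start_year` and `start_month` columns are not created in this section. One possible way to derive them, assuming they come from a loan date column (called `LoanOriginationDate` here, which is an assumption), is sketched below on made-up rows:

```python
import pandas as pd

# Hypothetical rows; the column name LoanOriginationDate is an assumption.
toy = pd.DataFrame({"LoanOriginationDate": ["2013-01-15", "2013-10-02", "2008-04-30"]})

dates = pd.to_datetime(toy["LoanOriginationDate"])
toy["start_year"] = dates.dt.year
toy["start_month"] = dates.dt.month

# Months are discrete, so exact counts per month are easy to tabulate.
print(toy["start_month"].value_counts().sort_index())
```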
sns.distplot(loan_data['BorrowerRate'], rug=True)
plt.ylabel("Count")
plt.show()
The distribution is approximately normal, with most values around 0.15, but there is also a second peak between 0.30 and 0.34.
count_alpha = loan_data["ProsperRating (Alpha)"].value_counts()
# take the labels from the counts themselves so they always match the wedges
fig1, ax1 = plt.subplots(figsize=(20, 20))
ax1.pie(count_alpha, autopct='%1.1f%%', shadow=True, startangle=90, labels=count_alpha.index)
ax1.axis('equal') # Equal aspect ratio ensures that pie is drawn as a circle.
plt.show()
For the Prosper rating, unknown ratings have the largest share. The second-largest share of loans was rated 'C', followed by 'B'. 'AA' was given to very few loans, only 5.1%.
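The percentages printed on the pie chart can also be computed directly. A minimal sketch on made-up ratings (the proportions are illustrative, not the dataset's):

```python
import pandas as pd

# Hypothetical ratings; the proportions are illustrative only.
ratings = pd.Series(["Unknown"] * 5 + ["C"] * 3 + ["B"] * 1 + ["AA"] * 1)

# normalize=True returns fractions; scale to percent and round to one decimal.
shares = ratings.value_counts(normalize=True).mul(100).round(1)
print(shares)
```

Applied to `loan_data["ProsperRating (Alpha)"]`, this would give the share of each rating as a table.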
plt.figure(figsize = [13, 10])
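# restrict to numeric columns so corr() also works on recent pandas versions
sns.heatmap(loan_data.select_dtypes('number').corr(), annot = True, fmt = '.3f', cmap = 'vlag_r', center = 0)
plt.title("Correlation Matrix")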
plt.show()
This shows that LoanOriginalAmount is negatively correlated with BorrowerAPR and positively correlated with MonthlyLoanPayment and Investors.
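When only one variable is of interest, a single column of the correlation matrix is easier to read than the full heatmap. A minimal sketch on made-up numbers, chosen only so that the signs match the observation above:

```python
import pandas as pd

# Hypothetical values; only the direction of each relationship is meaningful.
toy = pd.DataFrame({
    "LoanOriginalAmount": [2000, 5000, 10000, 15000, 25000],
    "BorrowerAPR": [0.35, 0.30, 0.22, 0.18, 0.10],
    "MonthlyLoanPayment": [80, 180, 340, 500, 800],
    "Investors": [10, 60, 90, 200, 310],
})

# Correlation of every column with the loan amount,
# sorted from most negative to most positive.
corr_with_amount = (toy.corr()["LoanOriginalAmount"]
                    .drop("LoanOriginalAmount")
                    .sort_values())
print(corr_with_amount)
```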
sns.FacetGrid(loan_data, height=5).map(plt.scatter, "LoanOriginalAmount", 'MonthlyLoanPayment').add_legend()
plt.title("Loan Original Amount vs Monthly Loan Payment")
plt.show()
Loan amount and monthly loan payment are positively related.
datalim=loan_data[:500] # to have less data points
plt.scatter(datalim["LoanOriginalAmount"],datalim["Investors"])
plt.ylabel('Investors')
plt.xlabel('LoanOriginalAmount')
plt.title('Investors vs LoanOriginalAmount')
plt.show()
Loan amounts of 30,000 and 35,000 are outliers. Apart from those, the two variables are positively correlated, though only weakly.
#cross plots
sns.pairplot(loan_data, diag_kind='kde');
def boxgrid(x, y, **kwargs):
    """ Quick hack for creating box plots with seaborn's PairGrid. """
    default_color = sns.color_palette()[0]
    # keyword arguments are required by recent seaborn versions
    sns.boxplot(x=x, y=y, color=default_color)
# PairGrid creates its own figure, so no separate plt.figure() call is needed
g = sns.PairGrid(data = loan_data, y_vars = ['BorrowerAPR'],
                 x_vars = ['Term','ProsperRating (Alpha)', 'EmploymentStatus'], height = 4, aspect = 2)
g.map(boxgrid)
plt.title('BorrowerAPR vs Term, ProsperRating (Alpha), EmploymentStatus')
plt.show()
# PairGrid creates its own figure, so no separate plt.figure() call is needed
g = sns.PairGrid(data = loan_data, y_vars = ['LoanOriginalAmount'],
                 x_vars = ['Term','ProsperRating (Alpha)', 'EmploymentStatus'], height = 4, aspect = 2)
g.map(boxgrid)
plt.title('LoanOriginalAmount vs Term, ProsperRating (Alpha), EmploymentStatus')
plt.show()
plt.figure(figsize=[10, 10])
sb.countplot(y='LoanStatus', hue='Term', data=loan_data)
plt.title('LoanStatus by Term')
Most 36-month loans are current, followed by completed. Most 60-month loans are also current, and the second-largest group of 60-month loans has already been completed.
plt.figure(figsize=[10, 10])
plt.title('LoanStatus by start year')
sb.countplot(y='LoanStatus', hue='start_year', data=loan_data)
Current loans were mostly given in 2013, while completed loans were mostly given in 2008.
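The same comparison can be tabulated, which makes the peak year per status explicit. A minimal sketch on made-up rows (not taken from the dataset):

```python
import pandas as pd

# Hypothetical rows for illustration.
toy = pd.DataFrame({
    "LoanStatus": ["Current", "Current", "Current", "Completed", "Completed", "Completed"],
    "start_year": [2013, 2013, 2012, 2008, 2008, 2013],
})

# Rows are loan statuses, columns are years, cells are loan counts.
table = pd.crosstab(toy["LoanStatus"], toy["start_year"])
top_year = table.idxmax(axis=1)  # year with the most loans for each status
print(table)
print(top_year)
```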
plt.figure(figsize=[10, 10])
plt.title('LoanStatus by ProsperRating (Alpha)')
sb.countplot(y='LoanStatus', hue='ProsperRating (Alpha)', data=loan_data)
Most of the current loans were given a C rating, followed by a B rating. For the completed loans, most ratings are unknown, followed by a D rating.
plt.figure(figsize = [20, 20])
plt.scatter(data = loan_data, x = 'LoanStatus', y = 'LoanOriginalAmount', alpha = 1/10)
plt.title('Loan Status vs Loan Original Amount')
plt.show()
Current loans include the larger loan amounts (>35K), and most of the defaulted loans have amounts >25K. I assume that the current loans have higher amounts than the completed ones.
plt.figure(figsize=[10, 10])
sb.barplot(y='LoanOriginalAmount', x='ProsperRating (Alpha)', data=loan_data)
plt.title('Loan Original Amount vs ProsperRating (Alpha)')
plt.show()
The highest average loan amounts were given to ratings A, B, and AA. HR and E have the lowest average amounts.
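By default `sns.barplot` draws the mean of the y variable for each category, so the bars above correspond to a group-wise mean. A minimal sketch of the equivalent table on made-up rows (the amounts are illustrative):

```python
import pandas as pd

# Hypothetical loans; amounts are illustrative only.
toy = pd.DataFrame({
    "ProsperRating (Alpha)": ["A", "A", "HR", "HR", "AA"],
    "LoanOriginalAmount": [12000, 14000, 3000, 4000, 11000],
})

# Mean loan amount per rating, largest first.
mean_amount = (toy.groupby("ProsperRating (Alpha)")["LoanOriginalAmount"]
               .mean()
               .sort_values(ascending=False))
print(mean_amount)
```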
plt.figure(figsize = [10, 10])
plt.scatter(data = loan_data, y = 'AvailableBankcardCredit', x = 'LoanOriginalAmount')
plt.ylabel('Available Bankcard Credit')
plt.xlabel("Loan Original Amount")
plt.title('AvailableBankcardCredit vs LoanOriginalAmount')
plt.show()
People with less available bankcard credit take more loans.
plt.figure(figsize = [10, 10])
plt.scatter(data = loan_data, y = 'BorrowerAPR', x = 'LoanOriginalAmount')
plt.ylabel("Borrower APR")
plt.xlabel("Loan Original Amount")
plt.title('BorrowerAPR vs LoanOriginalAmount')
plt.show()
The highest APRs occur at the lowest loan amounts.
plt.figure(figsize = [10, 10])
plt.scatter(data = loan_data, y = 'BorrowerRate', x = 'LoanOriginalAmount')
plt.ylabel("Borrower Rate")
plt.xlabel("Loan Original Amount")
plt.title('BorrowerRate vs LoanOriginalAmount')
plt.show()
BorrowerRate and BorrowerAPR relate to the loan amount in a similar way: both are higher when the loan amount is lower.
plt.figure(figsize = [15, 8])
# number of loans per employment status, split by loan status
plt.title('EmploymentStatus by LoanStatus')
sb.countplot(data = loan_data, x = 'EmploymentStatus', hue = 'LoanStatus', palette = 'Reds')
Most employed borrowers have current loans, and the second-largest status is completed, so we can conclude that employed borrowers have fewer unpaid loans. Most of the completed loans belong to full-time employees. The other groups were given fewer loans or none at all.
plt.figure(figsize = [15, 8])
plt.title('LoanOriginalAmount vs EmploymentStatus')
# mean loan amount per employment status
sb.barplot(data = loan_data, y = 'LoanOriginalAmount', x = 'EmploymentStatus')
plt.show()
Employed people get the highest loan amounts, and self-employed people get the second highest. Part-time workers get the lowest amounts. Still, I am surprised that people who are not employed get larger loans than part-time workers.
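A surprising group mean is worth checking against the group size and the median, since a small group or a few large loans can pull the mean up. A minimal sketch on made-up rows (it does not reproduce the dataset's numbers, only the kind of check):

```python
import pandas as pd

# Hypothetical loans; note the single "Not employed" row.
toy = pd.DataFrame({
    "EmploymentStatus": ["Employed"] * 4 + ["Part-time"] * 3 + ["Not employed"],
    "LoanOriginalAmount": [9000, 11000, 10000, 10000, 4000, 5000, 6000, 7000],
})

# Mean, median, and number of loans for each employment status.
summary = (toy.groupby("EmploymentStatus")["LoanOriginalAmount"]
           .agg(["mean", "median", "count"]))
print(summary)
```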
sns.FacetGrid(loan_data, hue="LoanStatus", height=5).map(plt.scatter, "LoanOriginalAmount", 'IncomeRange').add_legend()
plt.title('LoanOriginalAmount vs IncomeRange by LoanStatus')
plt.show()
g = sb.FacetGrid(data = loan_data, col = 'Term', palette = 'colorblind', height = 10)
g.map(sb.regplot, 'EmploymentStatusDuration', 'LoanOriginalAmount',x_jitter=0.04, scatter_kws={'alpha':0.1})
plt.title('EmploymentStatusDuration vs LoanOriginalAmount by Term')
plt.show()
g = sb.FacetGrid(data = df, hue = 'EmploymentStatus', height = 10,
                 palette = 'colorblind', aspect = 2)
g.map(plt.scatter, 'EmploymentStatusDuration', 'LoanOriginalAmount')
plt.title('EmploymentStatusDuration vs LoanOriginalAmount by EmploymentStatus')
g.add_legend()
plt.figure(figsize = [30, 10])
plt.title('LoanStatus vs LoanOriginalAmount by Term')
ax = sb.pointplot(data = loan_data, x = 'LoanStatus', y = 'LoanOriginalAmount', hue = 'Term',
                  dodge = 0.3, linestyles = "")
plt.figure(figsize = [30, 10])
ax = sb.barplot(data = loan_data, x = 'EmploymentStatus', y = 'LoanOriginalAmount', hue = 'Term')
ax.legend(loc = 8, ncol = 3, framealpha = 1, title = 'Term')  # the legend shows the hue variable
plt.title('EmploymentStatus vs LoanOriginalAmount by Term')
plt.show()